A Complex Networks Approach for Data Clustering

Authors

  • Francisco Aparecido Rodrigues
  • Guilherme Ferraz de Arruda
  • Luciano da Fontoura Costa
Abstract

Many methods have been developed for data clustering, such as k-means, expectation maximization and algorithms based on graph theory. In the latter case, graphs are generally constructed using the Euclidean distance as a similarity measure and are partitioned using spectral methods. However, these methods are not accurate when the clusters are not well separated. In addition, they cannot determine the number of clusters automatically. These limitations can be overcome by using network community identification algorithms. In this work, we propose a methodology for data clustering based on complex network theory. We compare different metrics for quantifying the similarity between objects and consider three community-finding techniques. This approach is applied to two real-world databases and to two sets of artificially generated data. By comparing our method with traditional clustering approaches, we verify that the proximity measures given by the Chebyshev and Manhattan distances are the most suitable metrics for quantifying the similarity between objects. In addition, the community identification method based on greedy optimization provides the smallest misclassification rates.
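The pipeline outlined in the abstract (distance metric, similarity network, community detection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the fully connected network, the `exp(-d)` mapping from distance to edge weight, the toy data, and all function names are assumptions made for this example. The greedy step is a plain agglomerative modularity maximization that merges the pair of communities with the largest modularity gain until no merge improves modularity.

```python
import math
from itertools import combinations


def chebyshev(p, q):
    """Chebyshev distance: largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def manhattan(p, q):
    """Manhattan distance: sum of coordinate-wise differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def similarity_network(points, distance):
    """Weighted adjacency matrix of a fully connected network.

    Assumption for this sketch: edge weight = exp(-distance).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = math.exp(-distance(points[i], points[j]))
    return w


def greedy_modularity(w):
    """Agglomerative greedy modularity maximization on a weighted network.

    Returns one community label per node.
    """
    n = len(w)
    two_m = sum(sum(row) for row in w)  # sum of all node strengths (= 2m)
    communities = {i: {i} for i in range(n)}
    strength = {i: sum(w[i]) for i in range(n)}  # total strength per community
    while len(communities) > 1:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(communities, 2):
            # Total edge weight running between communities a and b.
            w_ab = sum(w[i][j] for i in communities[a] for j in communities[b])
            # Modularity change if a and b were merged.
            gain = 2.0 * (w_ab / two_m - strength[a] * strength[b] / two_m ** 2)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge increases modularity
            break
        a, b = best_pair
        communities[a] |= communities.pop(b)
        strength[a] += strength.pop(b)
    labels = [0] * n
    for label, members in enumerate(communities.values()):
        for i in members:
            labels[i] = label
    return labels


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_chebyshev = greedy_modularity(similarity_network(points, chebyshev))
labels_manhattan = greedy_modularity(similarity_network(points, manhattan))
print(labels_chebyshev, labels_manhattan)
```

Note that the number of clusters is not passed in anywhere: the merging stops on its own once modularity can no longer be increased, which is the property the abstract contrasts with spectral partitioning. The pairwise search over communities is kept deliberately simple and is only practical for small data sets.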

Similar resources

Bank efficiency evaluation using a neural network-DEA method

At present, evaluating the performance of banks is an important subject for societies and for bank managers who want to expand the scope of their operations. One of the non-parametric approaches for evaluating efficiency is data envelopment analysis (DEA). Using a mathematical programming model, DEA provides an estimate of efficiency surfaces. A major problem faced by DEA is that...

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information-gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

An Energy Efficient Clustering Method using Bat Algorithm and Mobile Sink in Wireless Sensor Networks

Wireless sensor networks (WSNs) consist of sensor nodes with limited energy. Energy efficiency is an important issue in WSNs because the sensor nodes are deployed in rugged and non-care areas and consume a lot of energy when they send data directly to the central station or sink. Recently, the IEEE 802.15.4 protocol has been employed as a low-power, low-cost, and low rat...

Multi-layer Clustering Topology Design in Densely Deployed Wireless Sensor Network using Evolutionary Algorithms

Due to resource constraints and dynamic parameters, reducing energy consumption has become one of the most important issues in wireless sensor network topology design. All proposed hierarchy methods cluster a WSN into different cluster layers in a single step of evolutionary algorithm usage with complicated parameters, which may reduce efficiency and performance. In fact, in WSNs topology, increasin...

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Journal:
  • CoRR

Volume: abs/1101.5141

Publication date: 2011